Quantifying the Benefits of Nonlinear Methods for Global Statistical Hindcasts of Tropical Cyclones Intensity

While tropical cyclone (TC) track forecasts have become increasingly accurate over recent decades, intensity forecasts from both numerical models and statistical schemes have lagged behind. Most operational statistical–dynamical forecasts of TC intensity use linear regression to relate the initial TC characteristics and the most relevant large-scale environmental parameters along the TC track to the TC intensification rate. Yet many physical processes involved in TC intensification are nonlinear, which potentially limits the skill of those linear schemes. Here, we develop two nonlinear TC intensity hindcast schemes, for the first time globally. These schemes are based on either support vector machine (SVM) or artificial neural network (ANN) algorithms. Contrary to linear schemes, which perform slightly better when trained individually over each TC basin, nonlinear methods perform best when trained globally. Globally trained nonlinear schemes improve TC intensity hindcasts relative to regionally trained linear schemes in all TC-prone basins, especially the SVM scheme, for which this improvement reaches ~10% globally. The SVM scheme, in particular, partially corrects the tendency of the linear scheme to underperform for moderate-intensity (category 2 and below on the Saffir–Simpson scale) and decaying TCs. Although the TC intensity hindcast skill improvements described above are an upper limit of what could be achieved operationally (when using forecasted TC tracks and environmental parameters), they are comparable to those achieved by operational forecasts over the last 20 years. This improvement is sufficiently large to motivate further testing of nonlinear methods for statistical TC intensity prediction at operational centers.


Introduction
Storm surges associated with tropical cyclones (TCs) are a major contributor to casualties and property loss caused by natural disasters in tropical coastal regions (Needham et al. 2015), especially in the Bay of Bengal and the Gulf of Mexico. For instance, TC Nargis caused 140 000 deaths, left 1 million people homeless, and caused $1 billion (U.S. dollars) of damage in Myanmar in April–May 2008 (McPhaden et al. 2009). A timely and accurate prediction of the TC track and intensity is therefore of great importance to allow authorities to take preventive actions, such as evacuating areas under threat. Over the last decades, TC track forecasting has improved significantly due to refined numerical weather prediction models, but accurate TC intensity forecasting is still a challenge (Elsberry et al. 2013; Emanuel and Zhang 2016). Track forecast errors have, for instance, been reduced by ~66% in both the North Pacific and Atlantic over the past decades (Landsea and Cangialosi 2018), whereas intensity forecasts have improved at only one-third to one-half of this rate (Cangialosi and Franklin 2014; DeMaria et al. 2014). The errors of TC intensity forecasts are indeed still large, ranging from ~5-8 kt (1 kt ≈ 0.51 m s⁻¹) at 12-h lead time to 15-25 kt at 120-h lead time (DeMaria et al. 2014; Cangialosi and Franklin 2014; Bushnell and Falvey 2018; Cangialosi 2019).
Operational centers run a hierarchy of TC intensity forecast models that range from fully coupled ocean–atmosphere or atmosphere-only numerical models to statistical-dynamical models to simple statistical prediction schemes (DeMaria 2009; DeMaria et al. 2014). Because of the complex physical processes affecting intensity changes, the very high spatial resolution required, and the difficulties in initializing real-time forecasts, statistical-dynamical TC intensity forecast models have remained competitive with dynamical models (Kucas 2010; DeMaria et al. 2007, 2014; Kaplan et al. 2015), except in the Atlantic basin, where dynamical models have outperformed statistical-dynamical forecasts in the most recent hurricane seasons (Cangialosi 2019).
Statistical-dynamical models use statistical techniques to relate future TC intensity changes to the cyclone's initial characteristics (current intensity and its time derivative) and to the large-scale environmental parameters encountered by the cyclone along its forecast track. Several large-scale environmental parameters indeed have a well-documented effect on cyclone intensity. Strong vertical wind shear can, for instance, inhibit TC intensification (Gray 1968; DeMaria 1996).
The maximum potential intensity (MPI; e.g., Miller 1958; Emanuel 1995; Holland 1997) increases as a function of sea surface temperature and gives an upper bound on the intensity that the cyclone can reach. Midtropospheric relative humidity represents the inhibition of convection in dry environments (Emanuel et al. 2004). Other currently used environmental parameters (Emanuel 2007; Knaff et al. 2005; DeMaria et al. 2005) that we also include in the current study are low-level vorticity, upper-tropospheric air temperature, and boundary layer equivalent potential temperature.
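To make the SST dependence of MPI concrete, operational statistical schemes commonly use an empirical exponential fit of the form MPI = A + B exp[C(SST − T0)]. The minimal Python sketch below assumes the Atlantic coefficients published by DeMaria and Kaplan (1994) (A = 28.2 m s⁻¹, B = 55.8 m s⁻¹, C = 0.1813 °C⁻¹, T0 = 30 °C) purely for illustration; they are not necessarily the basin-specific fits used in this study, and the function name is hypothetical.

```python
import math


def mpi_exponential(sst_c, a=28.2, b=55.8, c=0.1813, t0=30.0):
    """Empirical MPI (m/s) = A + B * exp(C * (SST - T0)), with SST in deg C.

    Defaults are the Atlantic coefficients of DeMaria and Kaplan (1994),
    used here only as an illustrative assumption; other basins (and the
    linear SST-MPI relations used for some basins) have different fits.
    """
    return a + b * math.exp(c * (sst_c - t0))


for sst in (26.0, 28.0, 30.0):
    print(f"SST = {sst:.0f} C -> MPI = {mpi_exponential(sst):.1f} m/s")
```

The exponential form lets MPI rise steeply over the warmest waters, which is one reason some schemes also include a quadratic MPI term as a predictor.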
There are six major regions of TC development worldwide: the northwestern Pacific (NWP), the northeastern Pacific (NEP), the southwestern Pacific (SWP), the North Atlantic (ATL), the north Indian Ocean (NIO), and the southern Indian Ocean (SIO) (Fig. 1a). Operational centers worldwide have developed separate statistical-dynamical TC intensity forecast models for various TC basins. The Statistical Hurricane Intensity Prediction Scheme (SHIPS) was initially developed for the Atlantic basin, based on a multiple linear regression (MLR) technique that relates predictors to the TC intensity change (DeMaria and Kaplan 1994, 1999; DeMaria et al. 2005). Knaff et al. (2005) later adapted this model for the northwestern Pacific [Statistical Typhoon Intensity Prediction Scheme (STIPS)]. Knaff and Sampson (2009) developed the Southern Hemisphere Statistical Typhoon Intensity Prediction Scheme (SH-STIPS) jointly for the southwestern Pacific and southern Indian Ocean basins. Finally, Kotal et al. (2008) adapted STIPS to north Indian Ocean TCs. Neetu et al. (2017) recently developed a statistical-dynamical hindcast scheme similar to those above (i.e., using an MLR technique) separately for each TC-prone basin, but based on the same set of predictors and datasets for all basins. This set of TC hindcast schemes has a mean absolute error (MAE; Fig. 1b) comparable to that of SHIPS, STIPS, and SH-STIPS, with a 20%-40% MAE improvement relative to persistence (Fig. 1c), except in the ATL and NIO (10%-25%; Neetu et al. 2017). A large fraction of this MAE improvement (60%-80%) arises from accounting for the initial TC characteristics (i.e., the TC intensity at, and its intensity derivative at and before, the beginning of the forecast; Neetu et al. 2017). The environmental parameters that yield the most skill globally are vertical wind shear followed by maximum potential intensity, but their individual contributions strongly depend on the basin. Neetu et al. (2017) also demonstrated that statistical TC intensity forecasts poorly predict intensity changes of moderate TCs in all basins, with 2-4 times more skillful hindcasts for TCs of category 3 and above.
The TC intensity statistical forecasts described above use linear schemes (i.e., they assume a linear relationship between the predictors and the TC intensification rate). There are, however, nonlinear interactions between TC intensity and environmental parameters that are not considered in these models (Tang and Emanuel 2012; Lin et al. 2017). Tang and Emanuel (2012), for instance, argue that the flux of low-entropy air into the TC center (or midlevel ventilation index) is a major environmental parameter affecting TC intensification. This ventilation index was derived from a theoretical framework and depends nonlinearly on other environmental parameters, as it is formulated as the environmental wind shear multiplied by the nondimensional midlevel entropy deficit and divided by the maximum potential intensity. Although SHIPS already includes elements of the ventilation index, Tang and Emanuel (2012) suggest that the linear nature of SHIPS does not properly account for the influence of midlevel ventilation on TC intensification, given that the ventilation index is a nonlinear combination of SHIPS parameters.
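The nonlinearity of the ventilation index can be seen in a few lines. The sketch below follows the formulation described above (shear × nondimensional midlevel entropy deficit / potential intensity), writing the entropy deficit as (s*_m − s_m)/(s*_SST − s_b) as in Tang and Emanuel (2012). All input values are hypothetical; the point is that the effect of a given shear increase depends on the entropy deficit, an interaction that no additive linear combination of shear, entropy deficit, and MPI can represent.

```python
def entropy_deficit(s_m_sat, s_m, s_sst_sat, s_b):
    """Nondimensional midlevel entropy deficit chi_m = (s*_m - s_m) / (s*_SST - s_b)."""
    return (s_m_sat - s_m) / (s_sst_sat - s_b)


def ventilation_index(shear, chi_m, potential_intensity):
    """Ventilation index: shear * chi_m / potential intensity (same wind units)."""
    return shear * chi_m / potential_intensity


# Hypothetical values: the response to +10 m/s of shear doubles when chi_m
# doubles, i.e., shear and chi_m interact multiplicatively.
pi = 70.0  # hypothetical potential intensity (m/s)
gain_dry = ventilation_index(20.0, 1.0, pi) - ventilation_index(10.0, 1.0, pi)
gain_moist = ventilation_index(20.0, 0.5, pi) - ventilation_index(10.0, 0.5, pi)
print(gain_dry / gain_moist)
```

In a linear model a + b·shear + c·χ_m + d·MPI, both gains would be identical (10b), whatever χ_m is.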
Some studies have introduced a nonlinear scaling of some predictors (for instance, MPI²) in order to introduce a nonlinear dependence of the cyclone intensity evolution on those predictors (DeMaria and Kaplan 1999; Knaff et al. 2005). It would, however, be difficult and time-consuming to test all possible scalings for each parameter in a linear statistical scheme. Our strategy in the present study is thus to account for the nonlinear relationships between all selected variables in a more systematic way. A relevant option is to use statistical schemes that are designed to capture nonlinear relationships between variables, without applying any ad hoc scaling to the input parameters. DeMaria (2009), for instance, introduced a logistic growth equation model, in which storm growth is linearly related to vertical shear and the vertical stability of the atmosphere but saturates to yield a maximum intensity determined by the MPI. This model has been used for the Atlantic and northeast Pacific since 2006, with a skill increase of up to 15% relative to SHIPS for a given season, especially at long lead times (DeMaria 2009). Similarly, Lin et al. (2017) used sparse generalized additive models, which allow nonlinear transforms of predictors, to identify and characterize nonlinear effects of environmental parameters on TC intensification. These models only marginally increase the skill of ATL and NWP TC intensity prediction relative to a linear regression approach.
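The saturating behavior of a logistic growth model can be sketched as dV/dt = κV − β(V/V_mpi)^n V, where κ is the growth rate (tied to shear and stability in DeMaria 2009) and the second term damps growth as V approaches a bound set by MPI. In this minimal sketch, β = 1/24 h⁻¹ and n = 2.5 are assumed values, κ is prescribed rather than regressed on predictors, and the integration is plain forward Euler; function names are hypothetical.

```python
BETA = 1.0 / 24.0  # saturation rate (1/h), assumed value
N_EXP = 2.5        # saturation exponent, assumed value


def lgem_integrate(v0, vmpi, kappa, beta=BETA, n=N_EXP, dt=1.0, hours=120.0):
    """Forward-Euler integration of dV/dt = kappa*V - beta*(V/vmpi)**n * V.

    v0 and vmpi share the same wind units (e.g., kt); kappa and beta are in 1/h.
    Returns the intensity after `hours`.
    """
    v = v0
    for _ in range(int(round(hours / dt))):
        v += dt * (kappa * v - beta * (v / vmpi) ** n * v)
    return v


def lgem_equilibrium(vmpi, kappa, beta=BETA, n=N_EXP):
    """Steady state of the equation above: V = vmpi * (kappa / beta)**(1/n)."""
    return vmpi * (kappa / beta) ** (1.0 / n)
```

With κ = β the storm saturates exactly at V_mpi; a weaker growth rate (e.g., under stronger shear) saturates below it, which is how such a model builds a nonlinear MPI ceiling into the forecast without an ad hoc MPI² term.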
Artificial neural networks (ANN; e.g., Rumelhart et al. 1986; Fine 1999) and support vector machines (SVM; e.g., Cortes and Vapnik 1995) are two popular nonlinear statistical modeling tools for nonparametric prediction, used in many fields, including oceanographic, meteorological, climatic, and climate impact studies (e.g., Tolman et al. 2005; Lee 2006; Liu et al. 2010 for ANN and, e.g., Elbisy 2015; Aguilar-Martinez and Hsieh 2009; Descloux et al. 2012 for SVM). Both schemes have already been used in TC studies. ANN and SVM schemes have both been used to predict TC rainfall (e.g., Lin et al. 2009; Wei 2012), with more skill for SVM-based models (Lin et al. 2009). ANN schemes have also been used to predict TC-induced storm surges (Lee 2009), tropical cyclogenesis (Hennon et al. 2005), and TC tracks (Ali et al. 2007; Roy and Kovordányi 2012). Only a couple of studies have specifically used SVM or ANN for TC intensity forecasts, both focusing on the NWP region. Lin et al. (2013) showed that their SVM scheme improved NWP TC intensity forecasts by about 6%-11% over the 2002-09 period, relative to official operational forecasts. Using the same input parameters as in STIPS, Sharma et al. (2013) showed that their nonlinear ANN scheme yielded a 2%-10% skill improvement relative to STIPS for the 2003-04 TC seasons.
The two papers above suggest the added value of nonlinear TC intensity forecasts. These papers, however, focus on the NWP basin and use a rather short dataset (~7 years). In this paper, we investigate the benefits of nonlinear methods for each individual basin, using the same approach and set of input parameters for consistency, as well as an extended dataset for more robust quantitative results. To that end, we compare global TC intensity forecasts using the ANN and SVM nonlinear methods to those made with a linear MLR approach very similar to that in Neetu et al. (2017). More specifically, we investigate whether nonlinear models should be trained for each basin individually (as for the MLR) or globally, the skill improvement they yield and its basin dependency, and whether this skill improvement depends on the cyclone characteristics (intensity, intensifying/decaying phase), as was the case for the MLR.
We use an MLR very similar to that described in Neetu et al. (2017) as a reference in the current paper and investigate whether nonlinear ANN/SVM models built from the same input parameters bring improvements. The MLR model, its input parameters, and our skill assessment method are summarized in section 2. Section 3 describes the ANN and SVM models. Section 4 compares the performance of these nonlinear models to that of the linear model. Section 5 provides a summary and discussion of our results.

a. Datasets
Building a TC intensity hindcast scheme requires the cyclone positions (used for obtaining the surrounding ''environmental parameters'') and the cyclone intensity, defined as the maximum 1-min horizontal wind (which is used, along with its time derivative, for training and verifying the scheme). The International Best Track Archive for Climate Stewardship (IBTrACS; Knapp et al. 2010) combines the best track data from several operational centers into a single database with TC locations and intensities in all basins. Joint Typhoon Warning Center (JTWC) data have the advantage of using the same track and intensity determination methodology for all basins and were used wherever available (i.e., in all basins but the NEP and ATL). For the NEP and ATL basins, we use the National Hurricane Center track data, which use the same 1-min wind-averaging period as JTWC. The TC dataset used in this study includes all best track points, whatever their intensity (including tropical storm and depression stages) and location (including extratropical cases). The number of cases at each forecast interval is provided in Table 1, for each TC basin and globally.
As detailed in section 2b, the TC intensity hindcast is based on documented relations between environmental parameters and cyclone evolution (e.g., vertical shear in the atmosphere tends to weaken cyclones). We use the 1979-2012, 0.75° × 0.75°, 6-hourly European Centre for Medium-Range Weather Forecasts (ECMWF) interim reanalysis (ERA-Interim; Dee et al. 2011) to determine the atmospheric environmental parameters and the sea surface temperature (which is used to estimate MPI) along the cyclone tracks. Globally, 2362 TCs occurred during this 34-yr period.
Hindcasts that include the ocean heat content as a predictor, computed over the shorter 1993-2012 period for which satellite altimetry provides reliable estimates of the oceanic state, give very similar results to those presented in the current paper (not shown). Following Neetu et al. (2017), MPI is derived empirically for each TC basin, using the same empirical relations generally used in operational models. We use an exponential relation between MPI and sea surface temperature in the NWP, SWP, ATL, and SIO basins and a linear relation in the NEP and NIO, in agreement with previous literature (Knaff et al. 2005; Knaff and Sampson 2009; Kotal et al. 2008; Neetu et al. 2017). To account for the inhibiting effect of strong environmental vertical wind shear on TC intensity, we use the 200-850-hPa zonal and total wind shear (USHR and SHRD), averaged over an annulus between 200 and 800 km from the TC center. Other environmental parameters that affect TC intensity include midtropospheric relative humidity (RHHI, averaged between 300 and 500 hPa), upper-level temperature (T200, at 200 hPa), and lower-level equivalent potential temperature (E925, at 925 hPa). RHHI, T200, and E925 are also averaged over the 200-800-km annulus around the TC center. We also use the low-level relative vorticity (Z850, at 850 hPa), averaged within 1000 km of the TC center, as in Knaff et al. (2005) and Knaff and Sampson (2009). The environmental variables above are also averaged in time, from the beginning of the hindcast to the lead time for which the TC intensity is estimated.
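The spatial and temporal averaging of the environmental predictors described above can be sketched with NumPy on a regular latitude-longitude grid such as the 0.75° ERA-Interim grid. This is a minimal sketch under stated assumptions: distances use the haversine formula on a spherical Earth of radius 6371 km, grid cells are weighted by cos(latitude), and the time average includes both the initial time and the lead time (the endpoint convention is not specified in the text). All function names are hypothetical.

```python
import numpy as np

EARTH_RADIUS_KM = 6371.0  # spherical-Earth assumption


def great_circle_km(lat0, lon0, lat, lon):
    """Haversine distance (km) from (lat0, lon0) to points (lat, lon), in degrees."""
    phi0, phi = np.radians(lat0), np.radians(lat)
    dphi = phi - phi0
    dlam = np.radians(np.asarray(lon) - lon0)
    a = np.sin(dphi / 2) ** 2 + np.cos(phi0) * np.cos(phi) * np.sin(dlam / 2) ** 2
    return 2.0 * EARTH_RADIUS_KM * np.arcsin(np.sqrt(a))


def annulus_mean(field, lats, lons, lat0, lon0, r_in_km=200.0, r_out_km=800.0):
    """Area-weighted mean of a 2D (lat, lon) field between r_in_km and r_out_km.

    Use r_in_km=0, r_out_km=1000 for the disk average applied to Z850.
    """
    lat2d, lon2d = np.meshgrid(lats, lons, indexing="ij")
    dist = great_circle_km(lat0, lon0, lat2d, lon2d)
    in_ring = (dist >= r_in_km) & (dist <= r_out_km)
    weights = np.cos(np.radians(lat2d)) * in_ring  # cos(lat) ~ grid-cell area
    return float(np.sum(field * weights) / np.sum(weights))


def track_time_mean(values_6h, lead_h):
    """Mean of 6-hourly along-track predictor values from t = 0 to t = lead_h.

    Both endpoints are included (an assumed convention).
    """
    n = lead_h // 6 + 1
    return float(np.mean(values_6h[:n]))
```

For each best-track point and reanalysis time, `annulus_mean` would be applied to SHRD, USHR, RHHI, T200, and E925 fields, and `track_time_mean` would then average those along-track values up to each 12-h lead time.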
TC intensity changes from the initial forecast time (DELV; see Table 2) at 12-h intervals up to 120-h lead time are the dependent variables (predictands) of our statistical schemes. Our linear statistical MLR model is similar to that of Neetu et al. (2017) and uses a multiple linear regression technique to relate DELV (the predictand) to all the predictors in Table 2. The data are partitioned into two groups, with 80% of the data used for training the model and the remaining 20% for testing (computing skill scores). Entire TC tracks are attributed to either the training or the testing dataset when randomly generating those datasets. This procedure is repeated 50 times to generate 50 different randomly selected training and testing sets. All skill measures (see section 2c) in the current study are obtained by averaging the skill of these 50 models. This also allows us to estimate 95% confidence intervals on the skill using Monte Carlo techniques. Neetu et al. (2017) demonstrated that this MLR was most skillful when trained separately for individual TC basins. In the current study, we train the MLR (and the two nonlinear schemes) either individually on each basin (''regionally trained'') or globally on all basins together (''globally trained'').
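The track-wise 80/20 resampling and MLR fit described above can be sketched as follows. Assigning whole tracks, rather than individual best-track points, to one subset keeps highly autocorrelated points of the same storm from appearing in both training and testing. This is a minimal sketch using ordinary least squares (`numpy.linalg.lstsq`); the function names are hypothetical, and any predictor screening used by Neetu et al. (2017) is not reproduced.

```python
import numpy as np


def split_by_track(track_ids, test_frac, rng):
    """Randomly assign whole TC tracks to the test set; returns (train, test) masks."""
    unique_ids = np.unique(track_ids)
    n_test = int(round(test_frac * unique_ids.size))
    test_ids = rng.choice(unique_ids, size=n_test, replace=False)
    test = np.isin(track_ids, test_ids)
    return ~test, test


def fit_mlr(X, y):
    """Ordinary least squares for y = b0 + X @ b; returns [b0, b...]."""
    A = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef


def predict_mlr(coef, X):
    return coef[0] + X @ coef[1:]


def repeated_mlr_mae(X, y, track_ids, n_repeats=50, test_frac=0.2, seed=0):
    """Test-set MAE of DELV for each of `n_repeats` random track-wise splits.

    Since predicted intensity = initial intensity + predicted DELV, the MAE
    on DELV equals the MAE on intensity at that lead time.
    """
    rng = np.random.default_rng(seed)
    maes = np.empty(n_repeats)
    for i in range(n_repeats):
        train, test = split_by_track(track_ids, test_frac, rng)
        coef = fit_mlr(X[train], y[train])
        maes[i] = np.mean(np.abs(predict_mlr(coef, X[test]) - y[test]))
    return maes
```

One such set of 50 regressions would be fitted per lead time, and per basin for regional training or on the pooled data for global training.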

c. Skill assessment
The model performance is assessed using the MAE, as in many similar studies (e.g., Knaff and Sampson 2009; Sharma et al. 2013; Neetu et al. 2017). The MAE is defined as the average absolute difference between the predicted TC intensity and the actual TC intensity in the IBTrACS database, across all TCs in the testing dataset. To ease the comparison between models, we define the skill as the percentage of improvement in MAE of one model relative to a reference model:

$$\mathrm{Skill} = 100 \times \frac{\mathrm{MAE}_{\mathrm{ref}} - \mathrm{MAE}_{\mathrm{model}}}{\mathrm{MAE}_{\mathrm{ref}}} \quad (\%).$$

Such a skill naturally depends on the reference used. In this paper, for instance, we aim to estimate the added value of nonlinear schemes (ANN and SVM) relative to the MLR model; in that case, the MLR is used as the reference model. But we can also estimate the skill relative to ''baseline'' models (as in DeMaria et al. 2007; Neetu et al. 2017) that only use the initial cyclone characteristics as predictors (i.e., VMAX, VMAX², and PER for the MLR model).

TABLE 2. List of environmental parameters used as predictors of the TC intensity in the current study. Variables in italics (VMAX² and MPI²) are used only in the multiple linear regression (MLR) model, not in the artificial neural network (ANN) and support vector machine (SVM) models. Variables marked with a * are estimated from an area average between 200 and 800 km of the cyclone position. Variables marked with a ** are estimated from an area average within 1000 km of the cyclone position. Variables marked with a # are time averaged between the initial and the forecast time. DELV is the predictand.
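The skill score defined in section 2c, and a Monte Carlo style confidence interval over the 50 resampled models, can be computed as in this minimal sketch. Taking the 2.5th and 97.5th percentiles of the 50 paired skill values (model and reference evaluated on the same split) is an assumption about how the intervals are formed; the text only states that Monte Carlo techniques are used. Assuming persistence means no intensity change, a persistence reference would simply predict DELV = 0. Function names are hypothetical.

```python
import numpy as np


def mae(pred, obs):
    """Mean absolute error between predicted and observed intensities."""
    return float(np.mean(np.abs(np.asarray(pred) - np.asarray(obs))))


def skill_percent(mae_model, mae_ref):
    """Percentage MAE improvement of a model over a reference model."""
    return 100.0 * (mae_ref - mae_model) / mae_ref


def skill_with_ci(mae_model_runs, mae_ref_runs, level=95.0):
    """Mean skill and percentile interval over paired resampling runs."""
    s = skill_percent(np.asarray(mae_model_runs), np.asarray(mae_ref_runs))
    tail = (100.0 - level) / 2.0
    lo, hi = np.percentile(s, [tail, 100.0 - tail])
    return float(np.mean(s)), float(lo), float(hi)
```

For example, a model with an MAE of 9 kt against a reference MAE of 10 kt has a skill of 10%.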

Development of nonlinear TC intensity hindcasts
The previous section introduced the linear model (MLR) that we take as a reference in the current study. In this section, we describe the nonlinear schemes that we evaluate: the ANN (section 3a) and the SVM (section 3b). Because nonlinear schemes are expected to capture nonlinear dependencies between the predictors and the predictand by construction, the nonlinear input parameters (VMAX² and MPI²) were not included in these schemes.

a. ANN scheme
Artificial neural networks are machine-learning algorithms that are used to solve artificial intelligence problems, including forecasting and prediction. They can approximate any continuous function (Hornik 1991) and are hence well suited to modeling nonlinear processes. A neural network typically consists of interconnected nodes, called neurons, arranged in layers. The input layer passes the input data (predictors) to the next layer (hidden neurons). Each hidden neuron computes a weighted sum of its inputs plus a bias and transforms it with a nonlinear activation function. That process is repeated in each of the following hidden layers, and the last layer of neurons returns an output (predictand). The ''training dataset'' is used to determine the values of the weights and biases that minimize a cost function (e.g., the mean squared error between the computed and correct outputs).
Our ANN model uses one input layer, one hidden layer, and one output layer, with a hyperbolic tangent activation function. The input layer contains as many neurons as there are input parameters (i.e., predictors), while the output layer consists of a single neuron that predicts the TC intensification rate at a given lead time. A back-propagation learning algorithm (Reed and Marks 1999) was used to estimate the biases and weights during the training of the ANN model. The mean squared difference between the output and observations is a nonlinear function of the weights and biases with several local minima, in which the model solution can be trapped. Another problem of nonlinear modeling is overfitting (i.e., when the model performs almost perfectly on the training dataset but is unable to generalize to new situations). To avoid overfitting in our ANN scheme, we use a technique called early stopping. This technique requires three disjoint datasets for training, validation, and testing. The weights and biases (i.e., the parameters that define the neural network) are obtained during the training step, but the iterative procedure is stopped once the error on the validation dataset starts increasing. Finally, the testing dataset allows the computation of a skill score on a dataset completely independent of that used during training and validation. In the following, the training dataset consistently uses a random selection of 60% of the TCs in the full database, while the remaining 40% are split into 20% for validation and 20% for testing. We follow the same procedure as for the MLR and repeat it 50 times with different randomly selected training, validation, and testing sets. We obtain the values of the ANN parameters (weights and biases) through an ensemble average of those 50 realizations. This procedure minimizes the problem of being trapped in a local minimum while solving the nonlinear problem.
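The architecture and early-stopping procedure described above can be sketched in NumPy: one tanh hidden layer (7 neurons, as retained in this study), a single linear output neuron, full-batch back-propagation on the mean squared error, and training halted once the validation error has stopped improving. The initialization, learning rate, and patience are assumptions, predictors are assumed standardized, and the ensemble averaging over 50 realizations is not shown; all names are hypothetical. Assuming nine inputs (the 11 MLR predictors minus VMAX² and MPI²), `n_weights(9, 7)` reproduces the 70 weights quoted in the text.

```python
import numpy as np


def n_weights(n_in, n_hidden):
    """Number of connection weights (biases excluded) in a 1-hidden-layer net."""
    return n_in * n_hidden + n_hidden


def init_ann(n_in, n_hidden, rng):
    """Small random weights and zero biases (an assumed initialization)."""
    return {
        "W1": rng.normal(0.0, 1.0 / np.sqrt(n_in), size=(n_in, n_hidden)),
        "b1": np.zeros(n_hidden),
        "W2": rng.normal(0.0, 1.0 / np.sqrt(n_hidden), size=n_hidden),
        "b2": 0.0,
    }


def forward(p, X):
    """Return hidden activations and the single linear output neuron."""
    h = np.tanh(X @ p["W1"] + p["b1"])
    return h, h @ p["W2"] + p["b2"]


def train_early_stopping(X_tr, y_tr, X_va, y_va, n_hidden=7, lr=0.05,
                         max_epochs=5000, patience=50, seed=0):
    """Full-batch back-propagation on MSE, stopped when the validation MSE
    has not improved for `patience` epochs; returns the best weights."""
    p = init_ann(X_tr.shape[1], n_hidden, np.random.default_rng(seed))
    best, best_err, wait = None, np.inf, 0
    n = len(y_tr)
    for _ in range(max_epochs):
        h, out = forward(p, X_tr)
        g_out = 2.0 * (out - y_tr) / n                   # dMSE/d(output)
        g_h = np.outer(g_out, p["W2"]) * (1.0 - h ** 2)  # back through tanh
        grads = {"W2": h.T @ g_out, "b2": g_out.sum(),
                 "W1": X_tr.T @ g_h, "b1": g_h.sum(axis=0)}
        for k in p:
            p[k] = p[k] - lr * grads[k]
        val_err = float(np.mean((forward(p, X_va)[1] - y_va) ** 2))
        if val_err < best_err:
            best_err, wait = val_err, 0
            best = {k: np.copy(v) for k, v in p.items()}
        else:
            wait += 1
            if wait >= patience:
                break
    return best, best_err
```

Keeping the weights from the best validation epoch, rather than the last one, is what makes early stopping act as a regularizer.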
More hidden neurons generally improve the performance of the neural network on the training data but increase the risk of overfitting. We built ANN models with 1-10 neurons in the hidden layer in order to determine the optimal number of neurons for our problem. Figure 2a displays the average training and testing MAE as a function of the number of hidden neurons. While the MAE on the training dataset decreases monotonically with an increasing number of neurons, it reaches a plateau or even slightly increases beyond about 5-7 neurons for the testing dataset. The ANN architecture therefore uses 7 neurons in the hidden layer throughout the paper, a number similar to that used in the Sharma et al. (2013) study (5 neurons). This ANN architecture hence uses 70 adjustable parameters (i.e., weights), compared to 11 for the MLR. As discussed in Ingrassia and Morlini (2007), the equivalent number of degrees of freedom in the ANN is, however, much smaller than the number of parameters and does not depend on the number of input variables.

b. SVM scheme
SVM is a supervised learning method that stems from statistical learning theory (Vapnik 2000). SVMs were initially developed for classification in the early 1990s and later used for regression problems (Vapnik 1995). The SVM technique requires the choice of a kernel function, which implicitly maps the predictors into a higher-dimensional space where they relate linearly to the predictand. The properties of this kernel function generally depend on a small number of hyperparameters that have to be set properly for optimal performance of the SVM algorithm.
SVM only requires two disjoint datasets, for training and testing. As for the MLR, 80% (20%) of the TCs were used for training (testing). As for the ANN and MLR schemes, SVM models were constructed using 50 randomly selected training/testing datasets, and the reported performance is the average over these 50 runs. We chose a radial basis kernel function (Scholkopf et al. 1997) because this widely adopted function (Keerthi and Lin 2003) often outperforms other kernel functions (Ding et al. 2012) and requires only a limited number of hyperparameters. The SVM performance depends on the choice of these hyperparameters. To illustrate this aspect, Fig. 2b shows the sensitivity of the SVM predictive skill to different values of the most sensitive hyperparameter (i.e., a smoothing parameter that controls the width of the kernel function and limits overfitting). While the MAE decreases monotonically with increasing values of this hyperparameter for the training dataset, it only decreases for values up to 10 for the testing dataset. Further increasing the value of this parameter results in a marginal MAE increase for testing, which deviates from the MAE for training, indicating overfitting in this range. This parameter was therefore fixed to 10. The SVM predictive skill is much less sensitive to the choice of the other hyperparameters (not shown).
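The track-grouped 80/20 resampling and the hyperparameter sensitivity test described above can be sketched with scikit-learn's `SVR` (RBF kernel), `GroupShuffleSplit`, and `StandardScaler`. The text does not say which library argument corresponds to its smoothing parameter, so the sketch sweeps a configurable `SVR` argument (`C` by default; `gamma`, which sets the kernel width, is the other natural candidate). That mapping, and the standardization of predictors, are assumptions, and the function name is hypothetical.

```python
import numpy as np
from sklearn.model_selection import GroupShuffleSplit
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR


def svr_mae_sweep(X, y, track_ids, values, param="C", n_splits=50,
                  test_size=0.2, seed=0):
    """Mean train/test MAE of an RBF-kernel SVR for each hyperparameter value.

    `param` names the SVR argument being swept ("C" by default); which
    argument matches the paper's smoothing parameter is an assumption.
    """
    splitter = GroupShuffleSplit(n_splits=n_splits, test_size=test_size,
                                 random_state=seed)
    # Whole tracks (groups) go to either the training or the testing subset.
    splits = list(splitter.split(X, y, groups=track_ids))
    results = {}
    for value in values:
        train_mae, test_mae = [], []
        for tr, te in splits:
            model = make_pipeline(StandardScaler(),
                                  SVR(kernel="rbf", **{param: value}))
            model.fit(X[tr], y[tr])
            train_mae.append(np.mean(np.abs(model.predict(X[tr]) - y[tr])))
            test_mae.append(np.mean(np.abs(model.predict(X[te]) - y[te])))
        results[value] = (float(np.mean(train_mae)), float(np.mean(test_mae)))
    return results
```

The value to retain is the one minimizing the testing MAE, e.g. `min(res, key=lambda v: res[v][1])`; a growing gap between training and testing MAE beyond that value is the overfitting signature seen in Fig. 2b.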

Results
As described in sections 2 and 3, we have developed three TC intensity hindcast schemes (MLR, ANN, and SVM), including two nonlinear ones (ANN and SVM). These schemes are constructed in a consistent way, with identical input parameters (Table 2), except that the ANN and SVM schemes do not use the VMAX² and MPI² input parameters. All these schemes are also constructed using 80% of the TC database (with 60% for training and 20% for validation for the ANN), and we show the average skill of 50 models trained with random choices of the training and testing databases. This allows a fair comparison between the three schemes.
(Note that we also tested including the quadratic terms in Table 2 in the linear model as an alternative to a fully nonlinear scheme; the ANN and SVM schemes clearly outperform this approach, not shown.) Neetu et al. (2017) demonstrated that training an MLR scheme individually for each basin performs better than a globally trained model. This may, however, not be the case for the ANN and SVM. Figure 3 illustrates the benefits of training models globally relative to regional training (see section 2c) for the three schemes. Regionally trained MLR models slightly outperform globally trained MLR models (~1%-5% on average over all basins; Fig. 3a). This superior performance is seen in all Northern Hemisphere TC-prone basins, but not for Southern Hemisphere TCs. It is particularly large for the NIO at long leads (~20%), but this result may not be very reliable given the small number of long-lived TCs (and TCs in general) in the NIO, and hence the small dataset used to validate the MLR regional model at these extended ranges (Table 1).
The superior performance of regionally trained models does not hold for nonlinear schemes. For both nonlinear schemes, globally trained models perform similarly to or slightly better than regionally trained ones. The globally averaged skill improvement for the globally trained ANN model ranges between ~0% and ~3% (Fig. 3b), with strong regional variations. This improvement is larger for the SWP, NIO, and SIO basins (~5%-10%), while differences between regionally and globally trained ANNs are rather modest for the NWP, NEP, and ATL basins. The globally trained SVM model also slightly outperforms the regionally trained model, with an improvement of up to 2% globally (Fig. 3c). The regional dependency of this improvement is similar to that of the ANN, with the largest improvement in the SWP, NIO, and SIO (~2%-13%). In contrast, the skill is reduced for the NEP. Overall, the globally trained SVM and ANN perform better than their regionally trained counterparts, in particular in the basins with the fewest cyclones (SWP, NIO, SIO; Fig. 1a). This suggests that globally trained nonlinear schemes can use data from other basins to improve forecasts in basins with a small training sample. In contrast, the MLR scheme performs better when trained specifically for each basin. In the following, we thus compare globally trained nonlinear models (which perform best) to the best MLR models (i.e., the regionally trained ones). The added value of nonlinear models is evaluated in Fig. 4 by computing the skill improvement of the globally trained ANN and SVM models relative to regionally trained MLR models (see section 2c). Nonlinear models systematically improve the skill relative to the MLR at all lead times and in all basins (Figs. 4a,b), except at 108 h for the NIO in the case of the ANN. Comparing the light color bars in Figs. 4a and 4b also indicates that this improvement is systematically larger for the SVM than for the ANN.
The globally averaged skill improvement relative to the MLR ranges between 4% and 7% for the ANN (light color bars in Fig. 4a) and between 8% and 12% for the SVM (light color bars in Fig. 4b). The additional skill yielded by nonlinear schemes is relatively uniform across TC basins (light color bars in Figs. 4a,b). There is thus a clear benefit of using nonlinear models for TC intensity forecasts in any TC basin (except the NIO at long lead times for the ANN). Neetu et al. (2017) showed that the TC initial characteristics contributed ~60%-80% of the overall skill in linear models. We now investigate whether this also holds for our two nonlinear schemes by comparing the performance of schemes using all predictors to ''baseline models'' using only the TC initial characteristics (see section 2c). Comparing dark and light colored bars in Figs. 4a,b reveals that the overall improvement brought by the ANN and SVM predominantly arises from a better handling of TC environmental parameters: the globally averaged improvement is far weaker when only accounting for TC initial characteristics than when also accounting for environmental parameters. The improvement arising from a better handling of TC initial characteristics is, however, slightly larger for the SVM than for the ANN model. Figure 5 allows us to investigate whether the respective contributions of individual parameters are similar for our two nonlinear schemes, by comparing the schemes using all predictors to schemes obtained by excluding each of the environmental parameters. In broad agreement with Neetu et al. (2017), SHRD (~3% contribution, globally) and MPI (~2%) are the environmental parameters that contribute most to the MLR skill globally (Fig. 5a), with very variable contributions depending on the basin. No single variable clearly dominates the performance of the nonlinear schemes, and environmental parameters generally all contribute more to their performance than in the MLR.
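The predictor-exclusion comparison described above can be sketched as a leave-one-predictor-out loop, where each predictor's contribution is the skill of the full model relative to the model refitted without it (the section 2c skill with the reduced model as reference). In this minimal sketch a least-squares linear model stands in for the fit/predict step; with the nonlinear schemes, `fit_predict` would wrap an ANN or SVM instead. In the paper only the environmental parameters are excluded in turn. All names are hypothetical.

```python
import numpy as np


def fit_predict_linear(X_tr, y_tr, X_te):
    """Least-squares linear model; stands in for any fit/predict routine."""
    A = np.column_stack([np.ones(len(X_tr)), X_tr])
    coef, *_ = np.linalg.lstsq(A, y_tr, rcond=None)
    return coef[0] + X_te @ coef[1:]


def predictor_contributions(X_tr, y_tr, X_te, y_te, names,
                            fit_predict=fit_predict_linear):
    """Skill (%) of the full model relative to the model without each predictor.

    contribution = 100 * (MAE_without - MAE_full) / MAE_without
    """
    mae_full = np.mean(np.abs(fit_predict(X_tr, y_tr, X_te) - y_te))
    contributions = {}
    for j, name in enumerate(names):
        keep = [k for k in range(X_tr.shape[1]) if k != j]
        pred = fit_predict(X_tr[:, keep], y_tr, X_te[:, keep])
        mae_without = np.mean(np.abs(pred - y_te))
        contributions[name] = float(100.0 * (mae_without - mae_full) / mae_without)
    return contributions
```

A contribution near zero means the remaining predictors compensate for the excluded one, which is why correlated predictors (e.g., SHRD and USHR) can each show small individual contributions.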
Only two parameters (SHRD and MPI) increase the model performance by more than 2% globally in the MLR, while four variables reach or exceed that threshold in the nonlinear schemes (SHRD, MPI, T200, and E925). SHRD remains the most important environmental parameter globally in the ANN scheme (~3.5%), closely followed by MPI, RHHI, T200, and E925 (all around 2%). E925 is the most important environmental parameter globally in the SVM scheme (~5%), followed by SHRD, MPI, and T200 (all around 3%). The contribution of each predictor also varies considerably from basin to basin in the nonlinear schemes, but more parameters tend to contribute to the model performance than in the MLR. This suggests that nonlinear schemes are able to extract more information from the environmental predictors than linear schemes.
Neetu et al. (2017) also demonstrated that an MLR model built with climatological values of the environmental input parameters performs very similarly to a model built from real-time values. While this result implies that the practical implementation of operational TC intensity forecasts can be considerably simplified, it also suggests that the MLR technique extracts only little information from the environmental parameters. To investigate whether nonlinear schemes extract more information, we compare the performance of experiments using real-time environmental predictors with that of experiments using climatological values (Fig. 6). As in Neetu et al. (2017), Fig. 6a shows that using real-time predictors in the MLR model adds little to its performance (1%-2%), with a slightly larger improvement for Southern Hemisphere TCs (up to 10% for SIO). The use of real-time predictors in nonlinear models results in a larger and more systematic improvement, in all basins and for all lead times (Figs. 6b,c). The globally averaged added value ranges between 5% and 8% for both ANN and SVM (Figs. 6b,c), with a particularly large improvement for Southern Hemisphere TCs (7%-17%). The basin-wise dependency of the added value brought by real-time environmental parameters is quite similar to that of the MLR, but the added value is generally larger (Figs. 6b,c versus Fig. 6a). This suggests that ANN and SVM are better able to capture the influence of the nonseasonal variations of atmospheric parameters on the TC intensification rate.
Neetu et al. (2017) also demonstrated that MLR hindcasts are 3-4 times more skillful for strong than for moderate TCs. Following their definition, we divided the predictands into two subsets based on the TC intensity at hindcast time. Best-track points with an intensity below 96 kt (category 2 or below on the Saffir-Simpson scale) are referred to as "moderate" (~63% of the dataset), while those exceeding 96 kt (categories 3-5) are considered "strong" (~37% of the dataset). As discussed in Neetu et al. (2017) for the MLR, nonlinear models predict the intensity changes of strong TCs better than those of moderate ones (not shown). Figures 7a and 7b show the SVM and ANN global skill improvement relative to the regionally trained MLR schemes. The ANN scheme is more skillful than the MLR, but this skill improvement does not depend much on whether the TC is moderate or strong. In contrast, the SVM scheme yields a larger skill increase for moderate TCs (21% at 108 h) than for strong TCs (13% at 108 h). The SVM scheme hence partially overcomes the weaker MLR performance for moderate TCs.
FIG. 4. Percentage of skill improvement for (a) globally trained ANN and (b) globally trained SVM relative to regionally trained MLR for baseline models (i.e., only TC initial characteristics, dark color bars) and full models (i.e., also including environmental parameters along the TC track, light color bars) at 24-, 60-, and 108-h lead times, for each basin and globally averaged. Error bars indicate the 95% confidence interval estimated from a bootstrap method.
Lee et al. (2016) showed that linear models tend to be less skillful at predicting decaying TCs than intensifying TCs. We hence tested the sensitivity of the skill to whether the TC is intensifying or decaying, by dividing the predictands into two subsets based on the sign of the TC intensification rate at the beginning of the hindcast. Linear models tend to have a higher skill for intensifying TCs (~40% skill improvement relative to persistence at 24 h) than for decaying TCs (~30%; not shown). Figures 7c and 7d show the SVM and ANN global skill improvement relative to the regionally trained MLR schemes: nonlinear schemes outperform linear schemes particularly for decaying TCs. This is especially striking for the SVM scheme, with a skill improvement relative to the MLR of ~7%-10% for the intensifying phase and 16%-22% for the decaying phase. The SVM in fact yields a similar skill improvement relative to persistence for intensifying and decaying TCs (46% at 24 h; not shown); that is, it corrects the tendency of the MLR to perform less well for decaying TCs.
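The two stratifications behind Fig. 7 can be sketched as follows. The 96-kt threshold and the sign convention for DELV come from the text; the `Case` record, its field names, and the MAE-based improvement measure are hypothetical choices made for illustration, and the numbers are invented.

```python
from collections import defaultdict
from dataclasses import dataclass

STRONG_THRESHOLD_KT = 96.0  # below: "moderate" (category 2 or weaker); above: "strong"


@dataclass
class Case:
    """One hypothetical hindcast case (field names are illustrative)."""
    vmax: float     # TC intensity at hindcast time (kt)
    delv: float     # intensification rate at the start of the hindcast
    err_nl: float   # error of the nonlinear (e.g., SVM) hindcast (kt)
    err_mlr: float  # error of the reference MLR hindcast (kt)


def intensity_class(case):
    # Best-track intensities are typically reported in 5-kt steps, so a
    # value of exactly 96 kt does not occur in practice.
    return "moderate" if case.vmax < STRONG_THRESHOLD_KT else "strong"


def phase(case):
    # Cases with DELV == 0 belong to neither subset and are skipped.
    if case.delv > 0:
        return "intensifying"
    if case.delv < 0:
        return "decaying"
    return None


def improvement_by_subset(cases, classify):
    """Percentage MAE reduction of the nonlinear scheme versus MLR per subset."""
    groups = defaultdict(list)
    for case in cases:
        label = classify(case)
        if label is not None:
            groups[label].append(case)
    result = {}
    for label, members in groups.items():
        mae_nl = sum(abs(c.err_nl) for c in members) / len(members)
        mae_mlr = sum(abs(c.err_mlr) for c in members) / len(members)
        result[label] = 100.0 * (mae_mlr - mae_nl) / mae_mlr
    return result


cases = [
    Case(vmax=60, delv=10, err_nl=4, err_mlr=5),
    Case(vmax=70, delv=-5, err_nl=6, err_mlr=8),
    Case(vmax=110, delv=15, err_nl=9, err_mlr=10),
    Case(vmax=120, delv=-10, err_nl=7, err_mlr=8),
]
print(improvement_by_subset(cases, intensity_class))
print(improvement_by_subset(cases, phase))
```

Passing the classifier as a function keeps the grouping logic shared, so both stratifications are computed identically and only the subset definition changes.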
FIG. 5. Respective contributions of each predictor to the overall skill for (a) regionally trained MLR, (b) globally trained ANN, and (c) globally trained SVM, for each basin and globally. These contributions are estimated as the percentage of skill reduction at 60-h lead time when excluding each predictor in turn within each model framework. Error bars indicate the 95% confidence interval estimated from a bootstrap technique. In the MLR scheme, the VMAX² (MPI²) predictor is also removed when testing the impact of VMAX (MPI).
FIG. 6. Added value of using real-time rather than climatological environmental atmospheric parameters for (a) regionally trained MLR, (b) globally trained ANN, and (c) globally trained SVM models, at 24-, 60-, and 108-h lead times, for each basin and globally averaged. This added value is measured as the percentage of skill improvement when using real-time rather than climatological environmental atmospheric parameters in each model. Error bars indicate the 95% confidence interval estimated from a bootstrap technique.

Conclusions and discussion
Our results indicate that nonlinear schemes (ANN and SVM) systematically improve the skill of TC hindcasts relative to the widely used linear models (MLR) by 5%-15%, in all TC-prone basins, for all lead times and all TC categories (except for the NIO at long lead times in the ANN model). These results hence demonstrate that accounting for the nonlinear relationships between the input parameters and the TC intensity greatly benefits the hindcast skill of statistical-dynamical forecasts. In addition, the comparison of the ANN and SVM schemes reveals a better performance of SVM (10%-15% skill improvement over the MLR globally versus 6%-11% for ANN). The SVM scheme also partially corrects the lower skill of the linear scheme in predicting the intensity evolution of moderate (category 2 or below) and decaying TCs. This better performance of SVM likely results from two desirable mathematical properties of this approach. First, contrary to ANN, SVM maps the inputs into a higher-dimensional feature space in which the regression problem becomes linear, and the resulting optimization problem is convex; this avoids the multiple-minima issue and provides a more robust estimate of the global solution. Second, it uses a more robust error norm, minimizing the error rate and the model complexity simultaneously, whereas ANN only minimizes the error rate once the network architecture has been chosen. This usually results in SVM performing better than ANN when moving from the training to the testing dataset (Bisgin et al. 2018).
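For reference, the standard ε-insensitive support vector regression (ε-SVR) problem makes both properties explicit. We assume here, for illustration only, that the SVM scheme follows this textbook form; its kernel and hyperparameters are not restated in this section. Given predictor vectors x_i and predictands y_i (i = 1, ..., N), a feature map φ, and hyperparameters C > 0 and ε ≥ 0, ε-SVR solves

$$
\begin{aligned}
\min_{\mathbf{w},\,b,\,\boldsymbol{\xi},\,\boldsymbol{\xi}^{*}} \quad & \frac{1}{2}\lVert\mathbf{w}\rVert^{2} + C\sum_{i=1}^{N}\left(\xi_i + \xi_i^{*}\right) \\
\text{subject to} \quad & y_i - \mathbf{w}^{\top}\varphi(\mathbf{x}_i) - b \le \varepsilon + \xi_i, \\
& \mathbf{w}^{\top}\varphi(\mathbf{x}_i) + b - y_i \le \varepsilon + \xi_i^{*}, \\
& \xi_i \ge 0,\quad \xi_i^{*} \ge 0, \qquad i = 1,\dots,N,
\end{aligned}
$$

and predicts

$$
f(\mathbf{x}) = \sum_{i=1}^{N}\left(\alpha_i - \alpha_i^{*}\right) K(\mathbf{x}_i,\mathbf{x}) + b,
\qquad K(\mathbf{x}_i,\mathbf{x}) = \varphi(\mathbf{x}_i)^{\top}\varphi(\mathbf{x}),
$$

where α_i, α_i^* ≥ 0 are the Lagrange multipliers of the dual problem. The first term of the objective penalizes model complexity and the second penalizes errors larger than ε, so both are minimized together. The problem is a convex quadratic program, so any local minimum is global, and the kernel K keeps the regression linear in feature space while it remains nonlinear in the original predictors.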
Our results echo those of the few other studies that compared the performance of linear and nonlinear schemes (DeMaria 2009; Lin et al. 2013; Sharma et al. 2013). For the 2006/07 cyclonic seasons, the nonlinear logistic growth equation model of DeMaria (2009) yields a skill increase of 0% to 15% in the northeast Pacific and of 0% to 10% in the Atlantic relative to SHIPS, especially at long lead times, with a simpler set of input parameters. Lin et al. (2013) found an improvement of 5%-10% when comparing their nonlinear SVM scheme with operational MLR forecasts in the northwest Pacific over the 2002-09 period. Our results for the NWP indicate an even larger improvement (~8%-16%; see Fig. 4b). Unlike the present study, however, Lin et al. (2013) did not use the same set of input parameters as the operational forecast, which makes a quantitative comparison difficult. Sharma et al. (2013), on the other hand, used the same parameters as the linear STIPS model and found a 3%-10% skill improvement for their nonlinear ANN scheme at all lead times for the 2003-04 TC seasons in the NWP. We find a similar improvement (5%-11%; Fig. 4a) over the longer 1979-2012 period.
FIG. 7. Globally averaged percentage of skill improvement relative to regionally trained MLR for (a) moderate (i.e., <96 kt: category 2 and below) and strong TCs (i.e., >96 kt: category 3 or more) and for (b) intensifying (DELV > 0) and decaying TCs (DELV < 0) for globally trained ANN (blue) and SVM (green) schemes at 24-, 60-, and 108-h lead times. Error bars provide the 95% confidence interval estimated from a bootstrap method.

JUNE 2020
NEETU ET AL.
By applying nonlinear TC intensity forecasting methods to all TC basins globally, the present study extends previous results obtained over a limited set of basins (NWP and ATL) and a limited period (~7 years vs 34 years here), and demonstrates the added value of nonlinear schemes for TC intensity prediction for all basins, lead times, and TC intensities. This improvement arises from the nonlinear relationships between the environmental parameters and the cyclone intensity, which nonlinear schemes capture more easily. Sensitivity experiments using environmental predictors calculated from climatological data (rather than real-time values) indicate that nonlinear schemes make far better use of the nonseasonal variations of these environmental predictors. Accounting for these real-time variations indeed yields a far larger improvement in the nonlinear schemes (5%-8% for ANN and SVM, globally) than in the linear scheme (~1%-3%).
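The difference between the real-time and climatological experiments can be sketched as follows. Assuming, hypothetically, that the climatology is a calendar-month mean at each location (the averaging actually used for the experiments is not restated in this section), each real-time predictor value splits into a seasonal part, which the climatological experiments keep, and a nonseasonal anomaly, which only the real-time experiments see. The record layout and the predictor values below are invented for illustration.

```python
from collections import defaultdict
from statistics import mean


def monthly_climatology(records):
    """Mean predictor value per (calendar month, location).

    `records` holds hypothetical (year, month, location, value) tuples.
    """
    by_key = defaultdict(list)
    for _year, month, location, value in records:
        by_key[(month, location)].append(value)
    return {key: mean(values) for key, values in by_key.items()}


def split_seasonal(record, climatology):
    """Return (climatological value, nonseasonal anomaly) for one record."""
    _year, month, location, value = record
    seasonal = climatology[(month, location)]
    return seasonal, value - seasonal


# Invented September values of one environmental predictor (e.g., SHRD)
# at a single hypothetical location "A".
shrd = [(2003, 9, "A", 10.0), (2004, 9, "A", 14.0), (2005, 9, "A", 12.0)]
clim = monthly_climatology(shrd)
print(split_seasonal(shrd[1], clim))  # (12.0, 2.0)
```

In this toy case, a climatological experiment would feed the model 12.0 for the 2004 point, while the real-time experiment would feed 14.0, i.e., the same seasonal value plus the +2.0 anomaly; the added value shown in Fig. 6 thus reflects how much skill each scheme draws from such anomalies.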
In addition, our results demonstrate that the SVM scheme is particularly efficient at improving the hindcast skill for moderate TCs (category 2 and below), for which linear models perform particularly poorly (Neetu et al. 2017). Similarly, the SVM scheme corrects the tendency of linear models to be less skillful for decaying than for intensifying TCs (Lee et al. 2016). This larger improvement for moderate TCs (which represent ~63% of our dataset) and decaying TCs is a strong incentive for using SVM TC intensity prediction schemes operationally.
Our results indicate that nonlinear schemes, especially SVM schemes, have a strong potential to improve statistical TC intensity forecasts. The best TC intensity forecasts have improved by ~8%-15% over the last decade, in both dynamical and linear statistical-dynamical models (DeMaria et al. 2014; Emanuel and Zhang 2016). The skill gains from using nonlinear schemes are thus comparable to the TC intensity forecast skill gain over a decade. While we expect some skill degradation in operational applications, due to the use of forecasted tracks and environmental parameters rather than reanalyzed environmental parameters along the observed track, the TC intensity skill improvements expected from the present results are sufficiently large to motivate more trials in operational mode.
Finally, the good performance of global training for nonlinear schemes also has interesting applications. First, a single system can be built and applied to various basins, in contrast with current statistical-dynamical schemes, which are generally built for one or a couple of TC basins. Second, this technique offers great hope in basins such as the northern Indian Ocean, where the limited number of TCs prevents an efficient training of MLRs, yielding poor skill. Because nonlinear schemes can be trained globally, they take advantage of the numerous cyclones in other basins, yielding a skill improvement of up to 10% relative to the MLR at 108-h lead time in the northern Indian Ocean for the SVM scheme (Fig. 4b). This offers great promise in this basin, which represents only 5% of TCs globally but almost 80% of the casualties they cause (Needham et al. 2015).